How to use the line finder utility¶

Important: The functionality described in this notebook is not part of the current package release. To access it, you need to obtain the code from the line-finder branch and install the package from source.

This tool finds lines and extrema in the internally calibrated mean spectra. It uses an approach described in Weiler, M. et al. 2023, A&A 671, A52.

Find extrema in the internally calibrated mean spectra¶

This tool returns the list of extrema in the internally calibrated mean spectra. The spectra are not evaluated in either sampled or continuous form.
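As a toy illustration of what "extrema" means here, the sketch below locates local maxima and minima of a sampled curve by looking for sign changes in the slope. It does not attempt to reproduce the method of Weiler et al. 2023 that the tool follows; the function name `find_extrema_sampled` and the sample values are hypothetical.

```python
def find_extrema_sampled(x, y):
    """Return (x, kind) for interior samples where the slope changes sign."""
    extrema = []
    for i in range(1, len(y) - 1):
        left = y[i] - y[i - 1]    # slope coming into sample i
        right = y[i + 1] - y[i]   # slope leaving sample i
        if left > 0 and right < 0:
            extrema.append((x[i], 'max'))
        elif left < 0 and right > 0:
            extrema.append((x[i], 'min'))
    return extrema


# Hypothetical samples, for illustration only.
x = [0, 1, 2, 3, 4, 5, 6]
y = [0.0, 1.0, 0.5, 0.2, 0.8, 1.5, 1.0]
print(find_extrema_sampled(x, y))
```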

In [1]:
# Import the tool
from gaiaxpy import find_fast

Input types¶

The available input types are: a pandas DataFrame, an ADQL query, a list of source IDs, and a path to a file with XP CONTINUOUS RAW data (csv, ecsv, fits, or xml).

Passing a DataFrame¶

In [2]:
import pandas as pd

f = '/path/to/XP_CONTINUOUS_RAW.csv'
df = pd.read_csv(f)

extrema_pwl = find_fast(df)
extrema_pwl.head()
Reading input DataFrame... Done!
Done! Output saved to path: ./output_lines.csv

                
Out[2]:
source_id xp extrema
0 5853498713190525696 BP 14.874711
1 5853498713190525696 BP 16.885261
2 5853498713190525696 BP 17.522690
3 5853498713190525696 BP 31.734373
4 5853498713190525696 BP 31.944037

Running a query¶

In [3]:
query_input = "select TOP 2 source_id from gaiadr3.gaia_source where has_xp_continuous = 'True'"

extrema_pwl = find_fast(query_input)
extrema_pwl.head()
INFO: Query finished. [astroquery.utils.tap.core]
Running query... Done!
Done! Output saved to path: ./output_lines.csv

                
Out[3]:
source_id xp extrema
0 488289544781895936 BP 15.714108
1 488289544781895936 BP 34.459567
2 488289544781895936 BP 34.594935
3 488289544781895936 BP 36.994297
4 488289544781895936 BP 38.556855

Passing a list¶

A list of source IDs can be passed to the fast finder as the first argument. The tool will then query the Archive for these objects.

In [4]:
sources_list = ['5853498713190525696', 5762406957886626816] # The source IDs can be either strings or integers.

extrema_pwl = find_fast(sources_list)
extrema_pwl.head()
Running query... Done!
Done! Output saved to path: ./output_lines.csv

                
Out[4]:
source_id xp extrema
0 5853498713190525696 BP 14.874711
1 5853498713190525696 BP 16.885261
2 5853498713190525696 BP 17.522690
3 5853498713190525696 BP 31.734373
4 5853498713190525696 BP 31.944037
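Because source IDs may be given as strings or integers, it can be convenient to normalise a mixed list before comparing it with the source_id values in the output. The helper below is a minimal sketch; `normalise_source_ids` is a hypothetical name and not part of the package, and it assumes the output stores source_id as integers, as the table above suggests.

```python
def normalise_source_ids(source_ids):
    """Return the IDs as a list of Python ints, whatever their input type."""
    return [int(source_id) for source_id in source_ids]


sources_list = ['5853498713190525696', 5762406957886626816]
normalised = normalise_source_ids(sources_list)
print(normalised)
```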

Path to a file¶

In [5]:
f = '/path/to/XP_CONTINUOUS_RAW.fits'

extrema_pwl = find_fast(f)
extrema_pwl.head() # Only the first few rows of the output are displayed when head() is used.
Reading input file... Done!
Done! Output saved to path: ./output_lines.fits

                
Out[5]:
source_id xp extrema
0 5853498713190525696 BP 14.874711
1 5853498713190525696 BP 16.885261
2 5853498713190525696 BP 17.522690
3 5853498713190525696 BP 31.734373
4 5853498713190525696 BP 31.944037

The fast finder output:¶

The fast finder returns all extrema found as a pandas DataFrame. For each source, the DataFrame lists the pseudo-wavelengths of the extrema in BP and RP, with the xp column identifying the band.
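To work with this long-format output, the rows can be grouped by source and band. The sketch below uses plain Python on the five rows shown in the example outputs above; `group_extrema` is a hypothetical helper, not a package function.

```python
from collections import defaultdict

# Rows copied from the example output: (source_id, xp, extrema).
rows = [
    (5853498713190525696, 'BP', 14.874711),
    (5853498713190525696, 'BP', 16.885261),
    (5853498713190525696, 'BP', 17.522690),
    (5853498713190525696, 'BP', 31.734373),
    (5853498713190525696, 'BP', 31.944037),
]


def group_extrema(rows):
    """Collect pseudo-wavelengths per (source_id, xp) pair."""
    grouped = defaultdict(list)
    for source_id, xp, pwl in rows:
        grouped[(source_id, xp)].append(pwl)
    return dict(grouped)


grouped = group_extrema(rows)
for (source_id, xp), pwls in grouped.items():
    print(source_id, xp, len(pwls), min(pwls), max(pwls))
```

On the DataFrame itself, an equivalent grouping should be obtainable with `extrema_pwl.groupby(['source_id', 'xp'])['extrema'].apply(list)`.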

Find extrema (and their properties) in the internally calibrated mean spectra¶

This tool looks for all extrema in the internally calibrated mean spectra. It also converts internally calibrated mean spectra from the continuous representation to an externally calibrated sampled form. The converted spectrum is used to provide basic properties of detected extrema.

In [6]:
# Import the tool
from gaiaxpy import find_extrema

Input types¶

The available input types are: a pandas DataFrame, an ADQL query, a list of sourceIds, and a path to a file with XP CONTINUOUS RAW data (csv, ecsv, fits, or xml).

Passing a DataFrame¶

In [7]:
import pandas as pd

f = '/path/to/XP_CONTINUOUS_RAW.csv'
df = pd.read_csv(f) # The values in the DataFrame can be edited if the user wishes to do so.

extrema = find_extrema(df)
extrema.head()
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[7]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5853498713190525696 bp_330 330.020145 6.956654e-16 5.180523e-18 3.970387 0.042459 8.510702
1 5853498713190525696 bp_334 334.593994 4.164433e-16 -1.505134e-17 3.816551 0.185722 4.731364
2 5853498713190525696 bp_337 337.632124 2.309730e-16 -1.001079e-16 3.614516 1.859752 3.284514
3 5853498713190525696 bp_341 341.739965 2.065532e-16 -2.151889e-17 3.701070 0.602507 3.552140
4 5853498713190525696 bp_344 344.993479 1.072451e-16 -1.218512e-16 3.931419 3.756464 3.915424

Running a query¶

In [8]:
query_input = "select TOP 2 source_id from gaiadr3.gaia_source where has_xp_continuous = 'True'"

extrema = find_extrema(query_input)
extrema.head()
INFO: Query finished. [astroquery.utils.tap.core]
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[8]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 488289544781895936 bp_333 333.993091 -3.375660e-18 1.406177e-18 5.592031 0.571797 1.328987
1 488289544781895936 bp_340 340.467525 1.582020e-18 3.987157e-18 7.154992 3.665364 2.238799
2 488289544781895936 bp_348 348.374722 -1.012049e-18 -1.692863e-18 7.203026 1.881983 2.434976
3 488289544781895936 bp_363 363.364787 1.090649e-18 1.395584e-18 6.722927 1.704171 2.134727
4 488289544781895936 bp_370 370.041668 -1.289301e-18 -2.375571e-18 7.546684 2.635407 3.594792

Passing a list¶

A list of source IDs can be passed to the extremafinder as the first argument. The tool will then query the Archive for these objects.

In [9]:
sources_list = ['5853498713190525696', 5762406957886626816] # The source IDs can be either strings or integers.

extrema = find_extrema(sources_list)
extrema.head()
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[9]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5853498713190525696 bp_330 330.020145 6.956654e-16 5.180523e-18 3.970387 0.042459 8.510702
1 5853498713190525696 bp_334 334.593994 4.164433e-16 -1.505134e-17 3.816551 0.185722 4.731364
2 5853498713190525696 bp_337 337.632124 2.309730e-16 -1.001079e-16 3.614516 1.859752 3.284514
3 5853498713190525696 bp_341 341.739965 2.065532e-16 -2.151889e-17 3.701070 0.602507 3.552140
4 5853498713190525696 bp_344 344.993479 1.072451e-16 -1.218512e-16 3.931419 3.756464 3.915424

Path to a file¶

In [10]:
f = '/path/to/XP_CONTINUOUS_RAW.fits'

extrema = find_extrema(f)
extrema.head()
Preparing required internal data...
Done! Output saved to path: ./output_lines.fits

                
Out[10]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5853498713190525696 bp_330 330.020145 6.956654e-16 5.180523e-18 3.970387 0.042459 8.510702
1 5853498713190525696 bp_334 334.593994 4.164433e-16 -1.505134e-17 3.816551 0.185722 4.731364
2 5853498713190525696 bp_337 337.632124 2.309730e-16 -1.001079e-16 3.614516 1.859752 3.284514
3 5853498713190525696 bp_341 341.739965 2.065532e-16 -2.151889e-17 3.701070 0.602507 3.552140
4 5853498713190525696 bp_344 344.993479 1.072451e-16 -1.218512e-16 3.931419 3.756464 3.915424

The extremafinder output:¶

All extrema found and their properties are returned as a pandas DataFrame. For each detected extremum, the following properties are provided:

  • line_name,
  • wavelength [nm]: calculated from pseudo-wavelength using the GaiaXPy dispersion function,
  • flux [W/nm/m^2]: value of flux at the wavelength corresponding to the extremum at pseudo-wavelength,
  • depth [W/nm/m^2]: difference between the line flux and the flux in estimated continuum,
  • width [nm]: estimated by the distance between the closest inflection points,
  • significance: the ratio between the line depth and the flux error at the extremum in the externally calibrated spectrum,
  • sig_pwl: the ratio between the line depth and the flux error at the extremum in the internally calibrated spectrum.
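Since significance and sig_pwl are both ratios of depth to flux error, one simple way to use them is to keep only extrema that exceed a threshold in both. The sketch below applies this to the five rows shown in the example outputs above; the threshold of 3 and the helper name `select_significant` are assumptions chosen for illustration, not recommendations from the package.

```python
# Rows copied from the example output: (line_name, depth, significance, sig_pwl).
rows = [
    ('bp_330', 5.180523e-18, 0.042459, 8.510702),
    ('bp_334', -1.505134e-17, 0.185722, 4.731364),
    ('bp_337', -1.001079e-16, 1.859752, 3.284514),
    ('bp_341', -2.151889e-17, 0.602507, 3.552140),
    ('bp_344', -1.218512e-16, 3.756464, 3.915424),
]


def select_significant(rows, threshold=3.0):
    """Keep names whose significance and sig_pwl both reach the threshold."""
    return [
        name
        for name, _depth, significance, sig_pwl in rows
        if significance >= threshold and sig_pwl >= threshold
    ]


selected = select_significant(rows)
print(selected)
```

With a DataFrame, the same selection can be written as `extrema[(extrema['significance'] >= 3) & (extrema['sig_pwl'] >= 3)]`.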

Find lines (and their properties) from a provided list¶

Lines in stellar spectra¶

By default, this tool looks for a short list of lines (H_alpha, H_beta, HeI).

In [11]:
# Import the tool
from gaiaxpy import find_lines

Input types¶

The available input types are: a pandas DataFrame, an ADQL query, a list of sourceIds, and a path to a file with XP CONTINUOUS RAW data (csv, ecsv, fits, or xml).

Passing a DataFrame¶

In [12]:
import pandas as pd

f = '/path/to/XP_CONTINUOUS_RAW.csv'
df = pd.read_csv(f)

lines = find_lines(df)
lines
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[12]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5853498713190525696 He I_2 586.310487 1.236627e-15 -5.690724e-17 22.490756 4.207830 5.056023
1 5853498713190525696 H_alpha 655.968059 4.178223e-15 1.856800e-15 14.492901 29.068717 58.355642
2 5853498713190525696 He I_3 704.156775 4.863446e-15 1.879177e-15 14.016893 18.544342 15.292537
3 5762406957886626816 H_beta 484.420302 2.409925e-16 -7.701987e-17 19.829839 30.636190 52.906350
4 5762406957886626816 H_alpha 655.954918 8.142886e-17 -2.154309e-17 14.190875 19.575013 34.208743
5 5762406957886626816 He I_3 707.165598 7.877543e-17 1.357762e-18 11.660292 1.981938 2.918542

Passing a list¶

A list of source IDs can be passed to the linefinder as the first argument. The tool will then query the Archive for these objects.

In [13]:
sources_list = ['5853498713190525696', 5762406957886626816] # The source IDs can be either strings or long integers.

lines = find_lines(sources_list)
lines
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[13]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5853498713190525696 He I_2 586.310487 1.236627e-15 -5.690724e-17 22.490756 4.207830 5.056023
1 5853498713190525696 H_alpha 655.968059 4.178223e-15 1.856800e-15 14.492901 29.068717 58.355642
2 5853498713190525696 He I_3 704.156775 4.863446e-15 1.879177e-15 14.016893 18.544342 15.292537
3 5762406957886626816 H_beta 484.420302 2.409925e-16 -7.701987e-17 19.829839 30.636190 52.906350
4 5762406957886626816 H_alpha 655.954918 8.142886e-17 -2.154309e-17 14.190875 19.575013 34.208743
5 5762406957886626816 He I_3 707.165598 7.877543e-17 1.357762e-18 11.660292 1.981938 2.918542

Path to a file¶

In [14]:
f = '/path/to/XP_CONTINUOUS_RAW.fits'

lines = find_lines(f)
lines
Preparing required internal data...
Done! Output saved to path: ./output_lines.fits

                
Out[14]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5853498713190525696 He I_2 586.310487 1.236627e-15 -5.690724e-17 22.490756 4.207830 5.056023
1 5853498713190525696 H_alpha 655.968059 4.178223e-15 1.856800e-15 14.492901 29.068718 58.355641
2 5853498713190525696 He I_3 704.156775 4.863446e-15 1.879177e-15 14.016893 18.544343 15.292538
3 5762406957886626816 H_beta 484.420302 2.409925e-16 -7.701987e-17 19.829839 30.636189 52.906350
4 5762406957886626816 H_alpha 655.954918 8.142886e-17 -2.154309e-17 14.190875 19.575013 34.208743
5 5762406957886626816 He I_3 707.165598 7.877543e-17 1.357762e-18 11.660292 1.981938 2.918542

The linefinder output:¶

All lines found and their properties are returned as a pandas DataFrame. For each detected line, the following properties are provided:

  • line_name,
  • wavelength [nm]: calculated from pseudo-wavelength using the GaiaXPy dispersion function,
  • flux [W/nm/m^2]: value of flux at the wavelength corresponding to the extremum at pseudo-wavelength,
  • depth [W/nm/m^2]: difference between the line flux and the flux in estimated continuum,
  • width [nm]: estimated by the distance between the closest inflection points,
  • significance: the ratio between the line depth and the flux error at the extremum in the externally calibrated spectrum,
  • sig_pwl: the ratio between the line depth and the flux error at the extremum in the internally calibrated spectrum.
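Given the definition of depth as the difference between the line flux and the estimated continuum flux, the continuum can be recovered by simple arithmetic. The sketch below assumes the sign convention depth = line_flux - continuum, so that a positive depth means the line lies above the continuum. This convention is inferred from the wording above rather than confirmed, and `describe_line` is a hypothetical helper.

```python
# Rows copied from the example output: (line_name, line_flux, depth).
rows = [
    ('He I_2', 1.236627e-15, -5.690724e-17),
    ('H_alpha', 4.178223e-15, 1.856800e-15),
    ('He I_3', 4.863446e-15, 1.879177e-15),
]


def describe_line(line_flux, depth):
    """Return the implied continuum flux and the side of the continuum.

    Assumes depth = line_flux - continuum (an assumed sign convention).
    """
    continuum = line_flux - depth
    side = 'above continuum' if depth > 0 else 'below continuum'
    return continuum, side


for name, line_flux, depth in rows:
    continuum, side = describe_line(line_flux, depth)
    print(f'{name}: continuum = {continuum:.6e} W/nm/m^2, line is {side}')
```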

Lines in QSO spectra¶

The source type can be set to QSO, with the redshift(s) provided alongside. By default, this mode looks for a short list of lines (H_alpha, H_beta, CIV, CIII], MgII, Ly_alpha).

In [15]:
# A list of sourceIds
sources_list = [858200268935710208, 3937755935439564672]
# A list with redshifts
zets = [(858200268935710208, 0.06107), (3937755935439564672, 2.698)]

lines = find_lines(sources_list, source_type='qso', redshift=zets)
lines
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[15]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 858200268935710208 H_beta 519.826376 3.409607e-17 1.465183e-17 26.649797 13.423813 25.802867
1 858200268935710208 H_alpha 696.920068 6.136522e-17 5.011845e-17 16.417239 38.716931 66.964718
2 3937755935439564672 Ly_alpha 453.975163 2.954658e-18 1.917204e-18 16.403438 5.230892 12.308939
3 3937755935439564672 C IV 570.721363 1.114978e-18 5.137242e-19 26.334381 5.040652 6.930669
4 3937755935439564672 C III] 710.276640 7.914149e-19 3.288256e-19 11.389798 2.849785 4.142953
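To see roughly where a redshifted line is expected, the standard relation observed = rest × (1 + z) can be applied. The sketch below uses the H_alpha and H_beta rest wavelengths that appear in this notebook's lines-file example and the redshift of the first source. Whether the tool applies exactly this relation internally is an assumption, and `observed_wavelength` is a hypothetical helper.

```python
# Rest wavelengths [nm] as used in the lines-file example of this notebook.
REST_WAVELENGTHS_NM = {'H_alpha': 656.461, 'H_beta': 486.268}


def observed_wavelength(rest_nm, redshift):
    """Shift a rest wavelength to the observed frame: rest * (1 + z)."""
    return rest_nm * (1.0 + redshift)


z = 0.06107  # redshift given for source 858200268935710208
for name, rest_nm in REST_WAVELENGTHS_NM.items():
    print(name, round(observed_wavelength(rest_nm, z), 3))
```

The detected positions in the table above (696.92 nm for H_alpha, 519.83 nm for H_beta) are close to, but not identical with, the values computed this way.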

Other options¶

Additional arguments can be passed to the tools.

These are:

  1. truncation
  2. user_lines
  3. plot_spectra
  4. save_plots
  5. output_path
  6. output_file
  7. output_format
  8. save_file

Truncation¶

The source mean BP/RP spectrum is described as a combination of basis functions. Particularly for faint sources or sources with a low number of observations, it is useful to represent the spectrum using a smaller set of basis functions to avoid higher-order bases fitting the noise in the observed data.

The truncation parameter is a boolean which toggles the truncation of the set of bases.
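The idea of truncating a basis-function expansion can be sketched with a toy example. Here a curve is represented as a weighted sum of basis functions, and truncation simply drops the higher-order terms. The cosine basis and the coefficient values are stand-ins chosen for illustration; they are not the bases or coefficients used for XP spectra, and the sketch does not cover how the tool chooses the number of bases to keep.

```python
import math


def evaluate_expansion(coefficients, x, n_keep=None):
    """Evaluate sum_k c_k * cos(k * x), optionally using only the first n_keep terms."""
    if n_keep is not None:
        coefficients = coefficients[:n_keep]
    return sum(c * math.cos(k * x) for k, c in enumerate(coefficients))


# Hypothetical coefficients: the last two are small and could be mostly noise.
coefficients = [1.0, 0.5, 0.25, 0.01, -0.02]

full = evaluate_expansion(coefficients, 0.3)
truncated = evaluate_expansion(coefficients, 0.3, n_keep=3)
print(full, truncated)
```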

In [16]:
# A list of sourceIds
sources_list = [5853498713190525696, 5762406957886626816]

extrema_pwl_tr = find_fast(sources_list, truncation=True)
extrema_pwl_tr.head()
Running query... Done!
Done! Output saved to path: ./output_lines.csv

                
Out[16]:
source_id xp extrema
0 5853498713190525696 BP 14.874719
1 5853498713190525696 BP 16.885232
2 5853498713190525696 BP 17.522827
3 5853498713190525696 BP 31.731594
4 5853498713190525696 BP 31.946459
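The effect of truncation on this source can be quantified by comparing the first five pseudo-wavelengths shown here with those from the untruncated runs earlier in the notebook. This is a minimal sketch that assumes the rows correspond one-to-one, which holds for the rows displayed but need not hold in general.

```python
# First five BP extrema for source 5853498713190525696, copied from the outputs.
full = [14.874711, 16.885261, 17.522690, 31.734373, 31.944037]
truncated = [14.874719, 16.885232, 17.522827, 31.731594, 31.946459]

# Shift of each extremum when truncation is switched on.
shifts = [t - f for f, t in zip(full, truncated)]
max_shift = max(abs(s) for s in shifts)

for f, s in zip(full, shifts):
    print(f'{f:.6f} -> shift {s:+.6f}')
print(f'largest absolute shift: {max_shift:.6f}')
```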

User's list of lines¶

A user-defined set of lines can be provided as a list or as a file through the user_lines parameter. This option is available for the linefinder tool only.

In [17]:
# A list of sourceIds
sources_list = [5853498713190525696, 5762406957886626816]
# A list with user's lines: wavelengths [nm] and names
new_lines = [(434.0472, 410.1734), ('H_gamma', 'H_delta')]

lines = find_lines(sources_list, user_lines=new_lines)
lines
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[17]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5762406957886626816 H_gamma 431.162668 3.534201e-16 -7.068856e-17 13.200787 12.131214 33.215733
In [18]:
# Path to file with lines
f = '/path/to/lines_example.txt'

lines = find_lines(sources_list, user_lines=f)
lines
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                
Out[18]:
source_id line_name wavelength_nm line_flux depth width significance sig_pwl
0 5853498713190525696 H_alpha 655.968059 4.178223e-15 1.856800e-15 14.492901 29.068717 58.355642
1 5762406957886626816 H_beta 484.420302 2.409925e-16 -7.701987e-17 19.829839 30.636190 52.906350
2 5762406957886626816 H_alpha 655.954918 8.142886e-17 -2.154309e-17 14.190875 19.575013 34.208743

Note: The file should contain two columns separated by spaces: the first for wavelengths [nm] and the second for line names, with no header.

E.g.:

656.461 H_alpha
486.268 H_beta
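Before passing such a file to the tool, it can be useful to check that it parses as expected. The sketch below reads text in the two-column format described above and returns it in the same shape as the list form of user_lines (a tuple of wavelengths followed by a tuple of names). `parse_lines_text` is a hypothetical helper and does not reflect how the package reads the file.

```python
def parse_lines_text(text):
    """Parse 'wavelength name' rows into [(wavelengths...), (names...)]."""
    wavelengths, names = [], []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first run of whitespace only, so names may contain spaces.
        wavelength, name = line.split(None, 1)
        wavelengths.append(float(wavelength))
        names.append(name.strip())
    return [tuple(wavelengths), tuple(names)]


example = "656.461 H_alpha\n486.268 H_beta\n"
parsed = parse_lines_text(example)
print(parsed)
```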

Plotting spectra¶

Spectra can be plotted with the detected extrema or lines marked by setting the boolean parameter plot_spectra. This option is available for the linefinder and extremafinder tools only.

In [19]:
lines = find_lines([5762406957886626816], plot_spectra=True)
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

                

The additional boolean parameter save_plots tells the program whether to save the plots.

In [20]:
lines = find_lines([5853498713190525696], plot_spectra=True, save_plots=True)
Preparing required internal data...
Done! Output saved to path: ./output_lines.csv

Output_path, output_file, output_format, save_file¶

Three parameters (output_path, output_file, and output_format) together define the full path of the resulting file.

The default output path is the current path. If the given output path does not exist, it will be created.

The default output file name is 'output_lines'.

The default output format matches the format of the input file (for example, a FITS input file yields a FITS output file by default). For any other input type (DataFrame, ADQL query, or list), the default is CSV.
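The defaults described above can be summarised in a few lines of code. The sketch below is an illustration of the stated rules only; `resolve_output` is a hypothetical helper, not the package's implementation, and it does not create the output directory.

```python
import os


def resolve_output(input_file=None, output_path='.', output_file='output_lines',
                   output_format=None):
    """Combine path, file name and format using the defaults described above."""
    if output_format is None:
        if input_file is not None:
            # Reuse the input file's extension, falling back to CSV if it has none.
            output_format = os.path.splitext(input_file)[1].lstrip('.') or 'csv'
        else:
            # DataFrame, ADQL query or list input: default to CSV.
            output_format = 'csv'
    return os.path.join(output_path, f'{output_file}.{output_format}')


print(resolve_output())
print(resolve_output(input_file='/path/to/XP_CONTINUOUS_RAW.fits'))
print(resolve_output(output_file='my_file', output_format='ecsv'))
```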

In [21]:
lines = find_lines([5853498713190525696], output_path='.', output_file='my_file', output_format='ecsv')
Preparing required internal data...
Done! Output saved to path: ./my_file.ecsv

                

The additional parameter save_file is a boolean that tells the program whether to save the results or not. If output_file is given but save_file is set to False, a warning will be raised.
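The interaction between save_file and output_file can be pictured as a small check that warns when a file name is supplied but saving is disabled. The sketch below uses Python's standard warnings module; the helper name, the warning text and the warning class are assumptions made for illustration and may differ from what the package emits.

```python
import warnings


def effective_output_file(output_file=None, save_file=True):
    """Return the file name to write, or None when nothing will be saved."""
    if not save_file:
        if output_file is not None:
            warnings.warn(
                'output_file was given but save_file is False; '
                'no file will be written.',
                UserWarning,
            )
        return None
    return output_file if output_file is not None else 'output_lines'


# Capture the warning so it can be inspected instead of printed to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = effective_output_file('my_file', save_file=False)

print(result, len(caught))
```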

In [22]:
lines = find_lines([5853498713190525696], output_path='.', output_file='my_file', output_format='ecsv', save_file=False)
Preparing required internal data...